17 research outputs found
Pose-Normalized Image Generation for Person Re-identification
Person Re-identification (re-id) faces two major challenges: the lack of
cross-view paired training data and learning discriminative identity-sensitive
and view-invariant features in the presence of large pose variations. In this
work, we address both problems by proposing a novel deep person image
generation model for synthesizing realistic person images conditional on the
pose. The model is based on a generative adversarial network (GAN) designed
specifically for pose normalization in re-id, thus termed pose-normalization
GAN (PN-GAN). With the synthesized images, we can learn a new type of deep
re-id feature free of the influence of pose variations. We show that this
feature is strong on its own and complementary to features learned with the
original images. Importantly, under the transfer learning setting, we show that
our model generalizes well to any new re-id dataset without the need for
collecting any training data for model fine-tuning. The model thus has the
potential to make re-id models truly scalable.
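The central idea, learning a pose-invariant descriptor from synthesized views and combining it with the original-image descriptor, might be sketched as follows. The fusion rule (mean-pooling the synthesized views, then concatenating and L2-normalizing) is an illustrative assumption, not the paper's exact scheme:

```python
import numpy as np

def fuse_reid_features(f_orig, f_pose_norm_views):
    """Fuse a descriptor from the original image with descriptors extracted
    from pose-normalized (synthesized) views. Mean-pooling + concatenation
    is a hypothetical fusion rule chosen for illustration."""
    f_pn = np.mean(np.stack(f_pose_norm_views, axis=0), axis=0)  # pool synthesized views
    fused = np.concatenate([f_orig, f_pn])                       # complementary features
    return fused / np.linalg.norm(fused)                         # L2-normalize for matching

# Toy usage: a 4-D original descriptor and two synthesized-view descriptors.
f_orig = np.ones(4)
views = [np.full(4, 2.0), np.full(4, 4.0)]
f = fuse_reid_features(f_orig, views)
```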
Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)
© 2018, Springer Nature Switzerland AG. Employing part-level features offers fine-grained information for pedestrian image description. A prerequisite of part discovery is that each part should be well located. Instead of using external resources like a pose estimator, we consider content consistency within each part for precise part location. Specifically, we aim to learn discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an image input, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves competitive results with the state-of-the-art methods, proving itself as a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, which are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiments confirm that RPP allows PCB to gain a further performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, surpassing the state of the art by a large margin. Code is available at: https://github.com/syfafterzy/PCB_RPP
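The two components might be sketched with NumPy as below: `pcb_uniform_partition` is the uniform-stripe baseline, and `refined_part_pooling` illustrates RPP's soft re-assignment of spatial vectors to their closest part. Representing parts by mean vectors and using a softmax temperature are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def pcb_uniform_partition(feat_map, p=6):
    """Split an (H, W, C) conv feature map into p horizontal stripes and
    average-pool each into a part-level vector (the PCB baseline idea)."""
    H, W, C = feat_map.shape
    stripes = np.array_split(feat_map, p, axis=0)
    return np.stack([s.reshape(-1, C).mean(axis=0) for s in stripes])  # (p, C)

def refined_part_pooling(feat_map, part_centers, temperature=1.0):
    """RPP-style soft re-assignment: each spatial column vector is softly
    assigned to the part it is most similar to (here: dot-product similarity
    to hypothetical part centers), then parts are re-pooled with those weights."""
    H, W, C = feat_map.shape
    cols = feat_map.reshape(-1, C)                        # (H*W, C) spatial vectors
    logits = cols @ part_centers.T / temperature          # similarity to each part
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                     # soft assignment weights
    return (w.T @ cols) / w.sum(axis=0, keepdims=True).T  # weighted part means, (p, C)
```

In practice the uniform parts would initialize the centers, which are then refined during training.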
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification
Unsupervised domain adaptive person Re-IDentification (ReID) is challenging
because of the large domain gap between source and target domains, as well as
the lack of labeled data in the target domain. This paper tackles this
challenge through jointly enforcing visual and temporal consistency in the
combination of a local one-hot classification and a global multi-class
classification. The local one-hot classification assigns images in a training
batch with different person IDs, then adopts a Self-Adaptive Classification
(SAC) model to classify them. The global multi-class classification is achieved
by predicting labels on the entire unlabeled training set with the Memory-based
Temporal-guided Cluster (MTC). MTC predicts multi-class labels by considering
both visual similarity and temporal consistency to ensure the quality of label
prediction. The two classification models are combined in a unified framework,
which effectively leverages the unlabeled data for discriminative feature
learning. Experimental results on three large-scale ReID datasets demonstrate
the superiority of the proposed method in both purely unsupervised and
unsupervised domain adaptive ReID tasks. For example, under the purely
unsupervised setting, our method outperforms recent unsupervised domain
adaptive methods, even though they leverage more labels for training.
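A minimal sketch of how visual similarity and temporal consistency might be fused when predicting labels, in the spirit of MTC: two images of the same person seen by a given camera pair tend to have a plausible time gap. The binned time-gap histogram per camera pair and the multiplicative fusion rule are assumptions for illustration:

```python
import numpy as np

def joint_similarity(visual_sim, time_gap, cam_pair_hist):
    """Combine a visual similarity score with a temporal-consistency score.
    `cam_pair_hist` is a hypothetical normalized histogram over binned
    time gaps for one camera pair; gaps beyond the last bin fall into it."""
    temporal_sim = cam_pair_hist[min(time_gap, len(cam_pair_hist) - 1)]
    return visual_sim * temporal_sim  # multiplicative fusion (illustrative)

# Assumed transition-time distribution for one camera pair.
hist = np.array([0.5, 0.3, 0.15, 0.05])
```

A pair that looks alike but has an implausible time gap is thus down-weighted before clustering.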
Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification
Person re-identification (ReID) remains a challenging task in many real-world
video analytics and surveillance applications, even though state-of-the-art
accuracy has improved considerably with the advent of deep learning (DL) models
trained on large image datasets. Given the shift in distributions that
typically occurs between video data captured from the source and target
domains, and absence of labeled data from the target domain, it is difficult to
adapt a DL model for accurate recognition of target data. We argue that for
pair-wise matchers that rely on metric learning, e.g., Siamese networks for
person ReID, the unsupervised domain adaptation (UDA) objective should consist
in aligning pair-wise dissimilarity between domains, rather than aligning
feature representations. Moreover, dissimilarity representations are more
suitable for designing open-set ReID systems, where identities differ in the
source and target domains. In this paper, we propose a novel
Dissimilarity-based Maximum Mean Discrepancy (D-MMD) loss for aligning
pair-wise distances that can be optimized via gradient descent. From a person
ReID perspective, the evaluation of the D-MMD loss is straightforward since
tracklet information allows a distance vector to be labeled as either
within-class or between-class. This makes it possible to approximate the
underlying distribution of target pair-wise distances for D-MMD loss
optimization and, accordingly, to align the source and target distance
distributions. Empirical results
with three challenging benchmark datasets show that the proposed D-MMD loss
decreases as the source and target distributions become more similar. Extensive
experimental evaluation also indicates that UDA methods that rely on the D-MMD
loss can significantly outperform baseline and state-of-the-art UDA methods for
person ReID without the common requirement for data augmentation and/or complex
networks.
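The core idea, aligning the distributions of within-class and between-class pair-wise distances across domains, might be sketched as follows. The RBF kernel, its bandwidth, and the use of squared Euclidean distances are illustrative choices; the paper's exact loss and its gradient-based optimization are not reproduced here:

```python
import numpy as np

def pairwise_dists(X):
    """Squared Euclidean distances between all feature pairs."""
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def mmd(a, b, gamma=1.0):
    """Squared MMD with an RBF kernel between two 1-D samples of
    pair-wise distances (a simplified stand-in for the D-MMD terms)."""
    k = lambda x, y: np.exp(-gamma * (x[:, None] - y[None, :]) ** 2)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def d_mmd_loss(src_feats, src_ids, tgt_feats, tgt_ids, gamma=1.0):
    """Align within-class and between-class distance distributions across
    domains. `tgt_ids` stand in for the labels derived from tracklets."""
    def split(X, ids):
        D = pairwise_dists(X)
        iu = np.triu_indices(len(X), k=1)          # unique pairs only
        same = ids[iu[0]] == ids[iu[1]]            # tracklet-derived labels
        return D[iu][same], D[iu][~same]           # within-class, between-class
    s_wc, s_bc = split(src_feats, src_ids)
    t_wc, t_bc = split(tgt_feats, tgt_ids)
    return mmd(s_wc, t_wc, gamma) + mmd(s_bc, t_bc, gamma)
```

When the source and target distance distributions coincide, the loss vanishes, matching the reported behavior that D-MMD decreases as the distributions become more similar.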
Semantically Selective Augmentation for Deep Compact Person Re-Identification
We present a deep person re-identification approach that combines
semantically selective, deep data augmentation with clustering-based network
compression to generate high performance, light and fast inference networks. In
particular, we propose to augment limited training data via sampling from a
deep convolutional generative adversarial network (DCGAN), whose discriminator
is constrained by a semantic classifier to explicitly control the domain
specificity of the generation process. Thereby, we encode information in the
classifier network which can be utilized to steer adversarial synthesis, and
which fuels our CondenseNet ID-network training. We provide a quantitative and
qualitative analysis of the approach and its variants on a number of datasets,
obtaining results that outperform the state-of-the-art on the LIMA dataset for
long-term monitoring in indoor living spaces.
NOVA: rendering virtual worlds with humans for computer vision tasks
Today, the cutting edge of computer vision research greatly depends on the availability of large datasets, which are critical for effectively training and testing new methods. Manually annotating visual data, however, is not only a labor-intensive process but also prone to errors. In this study, we present NOVA, a versatile framework to create realistic-looking 3D rendered worlds containing procedurally generated humans with rich pixel-level ground truth annotations. NOVA can simulate various environmental factors such as weather conditions or different times of day, and bring an exceptionally diverse set of humans to life, each having a distinct body shape, gender and age. To demonstrate NOVA's capabilities, we generate two synthetic datasets for person tracking. The first one includes 108 sequences, each with different levels of difficulty like tracking in crowded scenes or at nighttime, and aims to test the limits of current state-of-the-art trackers. A second dataset of 97 sequences with normal weather conditions is used to show how our synthetic sequences can be utilized to train and boost the performance of deep-learning-based trackers. Our results indicate that the synthetic data generated by NOVA represents a good proxy for the real world and can be exploited for computer vision tasks.
Robust Pedestrian Detection for Semi-automatic Construction of A Crowded Person Re-Identification Dataset
The problem of re-identification of people in a crowd commonly arises in real application scenarios, yet it has received less attention than it deserves. To facilitate research focusing on this problem, we have embarked on constructing a new person re-identification dataset with many instances of crowded indoor and outdoor scenes. This paper proposes a two-stage robust method for pedestrian detection in a complex crowded background to provide bounding box annotations. The first stage is to generate pedestrian proposals using Faster R-CNN and locate each pedestrian using Non-maximum Suppression (NMS). Candidates in dense proposal regions are merged to identify crowd patches. We then apply a bottom-up human pose estimation method to detect individual pedestrians in the crowd patches. The locations of all subjects are obtained based on the bounding boxes from the two stages. The identity of the detected subjects throughout each video is then automatically annotated using multiple features and spatial-temporal clues. The experimental results on a crowded pedestrian dataset demonstrate the effectiveness and efficiency of the proposed method.
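The NMS step used to keep one proposal per pedestrian in the first stage can be sketched as the standard greedy algorithm (boxes as `[x1, y1, x2, y2]`; the IoU threshold is a typical default, not necessarily the paper's):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy Non-maximum Suppression: repeatedly keep the highest-scoring
    box and drop remaining boxes that overlap it above `iou_thr`."""
    order = np.argsort(scores)[::-1]          # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thr]          # survivors for the next round
    return keep
```

Regions where many suppressed candidates cluster would then be flagged as crowd patches for the pose-estimation stage.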
Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification
© 2020, Springer Nature Switzerland AG. Unsupervised domain adaptation (UDA) in the task of person re-identification (re-ID) is highly challenging due to large domain divergence and no class overlap between domains. Pseudo-label based self-training is one of the representative techniques to address UDA. However, label noise caused by unsupervised clustering is a persistent problem for self-training methods. To suppress noise in pseudo-labels, this paper proposes a Noise Resistible Mutual-Training (NRMT) method, which maintains two networks during training to perform collaborative clustering and mutual instance selection. On the one hand, collaborative clustering eases the fitting to noisy instances by allowing the two networks to use pseudo-labels provided by each other as an additional supervision. On the other hand, mutual instance selection further selects reliable and informative instances for training according to the peer-confidence and relationship disagreement of the networks. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art UDA methods for person re-ID.
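A minimal sketch of the peer-confidence part of mutual instance selection: one network keeps an instance for training only if its *peer* assigns the instance's pseudo-label high confidence. The fixed threshold and the omission of the relationship-disagreement criterion are simplifications for illustration:

```python
import numpy as np

def mutual_instance_selection(probs_peer, pseudo_labels, conf_thr=0.6):
    """Return a boolean mask over instances: True where the peer network's
    predicted probability for the pseudo-label exceeds `conf_thr`.
    probs_peer: (N, num_classes) softmax outputs of the peer network."""
    peer_conf = probs_peer[np.arange(len(pseudo_labels)), pseudo_labels]
    return peer_conf > conf_thr
```

Filtering each network's batch through its peer in this way keeps the two models from reinforcing their own clustering mistakes.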